The signal-to-noise analysis of the 
Little-Hopfield model revisited 



D. Bollé, J. Busquets Blanco and T. Verbeiren
Abstract 

Using the generating functional analysis, an exact recursion relation is derived for the time evolution of the effective local field of the fully connected Little-Hopfield model. It is shown that, by leaving out the feedback correlations arising from earlier times in this effective dynamics, one precisely finds the recursion relations usually employed in the signal-to-noise approach. The consequences of this approximation as well as the physics behind it are discussed. In particular, it is pointed out why it is hard to notice the effects, especially for model parameters corresponding to retrieval. Numerical simulations confirm these findings. The signal-to-noise analysis is then extended to include all correlations, making it a full theory for dynamics at the level of the generating functional analysis. The results are applied to the frequently employed extremely diluted (a)symmetric architectures and to sequence processing networks.

1 Introduction 

In recent years, the treatment of dynamics using generating functional techniques (GFA) has received a lot of attention in the field of statistical mechanics of disordered systems, in particular neural networks (see, e.g., […] and references therein). Such a treatment allows for the exact solution of the dynamics and finds all relevant physical order parameters at any time step via the derivatives of a generating functional [3]-[…]. An alternative method to study the dynamics of neural networks is the so-called signal-to-noise analysis (SNA) or statistical neurodynamics (see, e.g., […]-[12]), where the idea is to start from a splitting of the local field into a signal part originating from the pattern to be retrieved and a noise part arising from the other patterns. The existing versions of this approach differ in their treatment of this noise term, ranging from the assumption that it is Gaussian, with various approximations for its variance […], to a supposedly exact treatment […].

In two recent papers, some comparisons have been made between the two methods for a sequence processing network […] and for the fully connected Blume-Emery-Griffiths network […], respectively. For the first system, surprisingly, the order parameter equations obtained through the exact GFA solution are shown to be completely equivalent to those of statistical neurodynamics, which is known to be an approximation that assumes Gaussian noise. These theoretical results are verified by computer simulations. We recall that this system contains no feedback correlations. For the second system, which is fully connected and hence does contain feedback correlations, it has been shown that the results of the GFA and SNA coincide up to the third time step. Some numerical experiments then indicated that they may differ for further time steps, certainly for those parameters of the system corresponding to spin-glass behaviour.

The idea of the present work is to perform a systematic analytical study of the relationship between both techniques using the fully connected Little-Hopfield model. In order to do so, we take the GFA one step further by deriving a recursion relation for the effective local field. To our knowledge such a recursion relation has not yet been reported in the literature. Furthermore, it is precisely this relation that we use as the basis for studying the correspondence between both methods.

For the fully connected model we show that the SNA, as it has been applied up to now, in fact approximates the exact dynamics because it neglects part of the correlations. We discuss the physics behind such a short-memory approximation and explain why it leads to very good results in the case of retrieval. Moreover, we show how to apply the SNA correctly, leading to a complete equivalence with the GFA. The results obtained are applied to other architectures and to sequence processing networks.

The paper is organized as follows. In Section 2 we recall the fully connected Little-Hopfield model and discuss the SNA approach to solve its dynamics. Section 3 briefly reviews the GFA and derives recursion relations for the effective local field in this framework. Section 4 introduces the short-memory approximation and shows how it reduces the results of the GFA to those of the SNA. It also explains the physics behind this approximation and discusses some numerical results. Section 5 presents a scheme for applying the SNA exactly. In Section 6 we apply our findings to the extremely diluted symmetric and asymmetric architectures and to sequence processing models. Finally, Section 7 contains some concluding remarks.
2 Signal-to-noise analysis of the Little-Hopfield model



2.1 The Little-Hopfield model 

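By now, the Little-Hopfield model is a standard model for associative memory and can be found in many textbooks (see, e.g., […]). Consider a system of $N$ Ising spins $\sigma_i$, $i = 1,\ldots,N$. We want to store $p = \alpha N$ patterns $\xi^\mu = \{\xi_1^\mu,\ldots,\xi_N^\mu\}$, $\mu = 1,\ldots,p$, whose components $\xi_i^\mu = \pm 1$ are independent and identically distributed with respect to $i$ and $\mu$, taking both values with equal probability. The local field in neuron $i$ is defined by

$$h_i^N(t) = \sum_{j\neq i} J_{ij}\,\sigma_j(t), \tag{1}$$

with the couplings given by the Hebb rule

$$J_{ij} = \frac{1}{N}\sum_{\mu=1}^{p}\xi_i^\mu\,\xi_j^\mu. \tag{2}$$

All neurons are updated in parallel according to the Glauber dynamics

$$\Pr\{\sigma_i(t+1) = s \mid h_i^N(t)\} = \frac{\exp\big(\beta s\,h_i^N(t)\big)}{2\cosh\big(\beta h_i^N(t)\big)}, \qquad s = \pm 1, \tag{3}$$

which becomes, in the limit $\beta = 1/T \to \infty$, equivalent to the gain function formulation

$$\sigma_i(t+1) = \mathrm{sgn}\big(h_i^N(t)\big). \tag{4}$$

In general, we write

$$\sigma_i(t+1) = g\big(h_i^N(t)\big). \tag{5}$$

The long time behaviour of the system is governed by the Hamiltonian

$$H = -\frac{1}{\beta}\sum_i \ln\cosh\big(\beta h_i^N\big). \tag{6}$$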

The thermodynamic and retrieval properties are visualized in the $(T,\alpha)$ phase diagram […]. There is a transition curve starting at $T = 0$, $\alpha = 0.138$ and ending at $T = 1$, $\alpha = 0$, beyond which the system no longer retrieves any pattern and behaves like a spin glass. For higher temperatures the system undergoes a transition to the paramagnetic phase.

The parallel dynamics of this model, treated with both the SNA and the GFA, and especially the comparison between these approaches, are the subject of the following sections.
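As an illustration of the parallel dynamics (1)-(5), the following is a minimal Python sketch, not the authors' simulation code. It builds Hebb couplings (2) with zero self-coupling, performs parallel Glauber updates (3) (or the sgn rule (4) when $\beta = \infty$), and records the overlap with the condensed pattern. The function names, the system size, the convention $\mathrm{sgn}(0) = +1$, and the way the initial state with overlap close to $m_0$ is prepared are all illustrative assumptions.

```python
import math
import random


def hebb_couplings(patterns):
    """Hebb rule (2): J_ij = (1/N) sum_mu xi_i^mu xi_j^mu, with J_ii = 0 as in (1)."""
    n = len(patterns[0])
    J = [[0.0] * n for _ in range(n)]
    for xi in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    J[i][j] += xi[i] * xi[j] / n
    return J


def parallel_step(J, sigma, beta, rng):
    """Update all spins simultaneously: Glauber rule (3), or sgn rule (4) if beta is infinite."""
    n = len(sigma)
    fields = [sum(J[i][j] * sigma[j] for j in range(n)) for i in range(n)]
    new_sigma = []
    for h in fields:
        if math.isinf(beta):
            new_sigma.append(1 if h >= 0 else -1)  # convention sgn(0) = +1 (assumption)
        else:
            p_plus = 0.5 * (1.0 + math.tanh(beta * h))  # = exp(beta h) / (2 cosh(beta h))
            new_sigma.append(1 if rng.random() < p_plus else -1)
    return new_sigma


def overlap(xi, sigma):
    """m_N^1(t) = (1/N) sum_i xi_i^1 sigma_i(t)."""
    return sum(a * b for a, b in zip(xi, sigma)) / len(xi)


def run(n=200, p=10, m0=0.6, beta=10.0, steps=5, seed=1):
    """Return the overlaps m(0), ..., m(steps) with the condensed pattern xi^1."""
    rng = random.Random(seed)
    patterns = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]
    J = hebb_couplings(patterns)
    xi = patterns[0]
    # initial state: copy xi^1 and flip each spin with probability (1 - m0)/2
    sigma = [s if rng.random() < (1 + m0) / 2 else -s for s in xi]
    history = [overlap(xi, sigma)]
    for _ in range(steps):
        sigma = parallel_step(J, sigma, beta, rng)
        history.append(overlap(xi, sigma))
    return history
```

For example, `run(n=200, p=10, m0=0.6, beta=10.0)` follows the overlap at $\alpha = 0.05$ and $T = 0.1$. With a single stored pattern and $\beta = \infty$, the overlap jumps to 1 after one step.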
2.2 Signal-to-noise analysis

In order to keep this paper self-contained, we briefly review the signal-to-noise analysis. We assume that the system has an initial finite overlap with one of the patterns, say the first one, which we call condensed. The other (non-condensed) patterns act as noise, making it harder for the network to retrieve the condensed pattern.

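We first focus on zero temperature. The key idea is to separate the signal from the noise in the local field

$$h_i^N(t) = \xi_i^1\,\frac{1}{N}\sum_j \xi_j^1\sigma_j(t) + \frac{1}{N}\sum_{\mu>1}\sum_j \xi_i^\mu\xi_j^\mu\sigma_j(t) - \alpha\sigma_i(t), \tag{7}$$

whereby it is technically convenient to include the self-interaction and subtract it again, leading to the term $\alpha\sigma_i(t)$. Quantities that carry a site index or an index $N$ refer to a finite system. We define the overlap and the residual overlap, respectively, by

$$m_N^1(t) = \frac{1}{N}\sum_i \xi_i^1\sigma_i(t), \qquad r_N^\mu(t) = \frac{1}{\sqrt{N}}\sum_i \xi_i^\mu\sigma_i(t). \tag{8}$$

The local field (7) then becomes

$$h_i^N(t) = \xi_i^1 m_N^1(t) + \frac{1}{\sqrt{N}}\sum_{\mu>1}\xi_i^\mu r_N^\mu(t) - \alpha\sigma_i(t). \tag{9}$$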
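With $\alpha = p/N$, the split (7)-(9) is an exact rewriting of (1), and this can be checked numerically. The plain-Python sketch below compares the direct local field with the signal-plus-noise form for random patterns and a random network state. The function name and the sizes are hypothetical.

```python
import math
import random


def check_decomposition(n=100, p=7, seed=0):
    """Return max_i |h_i from (1) - h_i from (9)| for random patterns and a random state."""
    rng = random.Random(seed)
    xi = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]
    sigma = [rng.choice((-1, 1)) for _ in range(n)]
    alpha = p / n  # loading ratio alpha = p/N

    def h_direct(i):
        # local field (1) with Hebb couplings (2), self-coupling excluded
        return sum(
            sum(xi[mu][i] * xi[mu][j] for mu in range(p)) * sigma[j] / n
            for j in range(n) if j != i
        )

    # overlap with the condensed pattern and residual overlaps, definitions (8)
    m = sum(xi[0][k] * sigma[k] for k in range(n)) / n
    r = [sum(xi[mu][k] * sigma[k] for k in range(n)) / math.sqrt(n) for mu in range(p)]

    worst = 0.0
    for i in range(n):
        signal = xi[0][i] * m
        noise = sum(xi[mu][i] * r[mu] for mu in range(1, p)) / math.sqrt(n)
        worst = max(worst, abs(h_direct(i) - (signal + noise - alpha * sigma[i])))
    return worst
```

The returned discrepancy is at the level of floating-point rounding: subtracting $\alpha\sigma_i(t)$ exactly compensates for the self-interaction terms included in (7).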

Our aim is to determine the form of the local field in the thermodynamic limit $N \to \infty$. To the signal term we apply the law of large numbers (LLN)

$$\lim_{N\to\infty} m_N^1(t) = m(t), \tag{10}$$

where the convergence is in probability. Taking a closer look at the second term in (9), we could apply the central limit theorem (CLT) if all the terms in the sum were independent. This, of course, depends on the architecture; it is true, e.g., for asymmetrically extremely diluted models [18], [19]. In general, there are non-trivial correlations between the terms, and the different implementations of the SNA mentioned before treat these correlations in different ways.

For the fully connected architecture at hand, the most naive and simple approach [13] is to keep the assumption that all terms in the sum are uncorrelated, and to simply apply the CLT to the second term in (9). The local field then becomes normally distributed with mean $\xi^1 m(t)$ and variance $\alpha$. This approach has been refined in the theory of statistical neurodynamics […]. There one also assumes that the noise part of the local field is normally distributed, but one calculates its variance explicitly, starting from its definition and taking into account part of the correlations between the embedded pattern and the neuron states in the dynamics. This results in a more complex structure of the variance of the local field. For more technical details we refer to […]. Detailed simulations show that the assumption of Gaussian noise is approximately valid and close to reality as long as retrieval is successful. In particular, it succeeds in explaining qualitatively the dynamical behaviour of retrieval in the associative memory model. However, in addition to a different critical capacity at $T = 0$, the basin of attraction calculated in this scheme is larger than the one obtained by computer simulations […]. This is attributed to the fact that the correlations of the local fields at successive time steps are neglected. More specifically, the average over the Gaussian distribution of the local field at time $t+1$ is taken independently of the average of the neuron state at time $t$, which clearly depends on the local field at time $t-1$.

Later on, these correlations between the local fields at successive time steps (and therefore also the correlations between neuron states at different time steps) have been taken into account. In this treatment one rewrites the noise part of the local field as the sum of two correlated terms to which one can apply the CLT, but one keeps the Gaussian assumption, throwing away possible non-Gaussian noise. For more details we refer to […]. This further refined theory gives a better explanation of the dynamics and the basin of attraction. Moreover, the storage capacity resulting from this method is in good agreement with the results from computer simulations.

However, we recall that the a priori Gaussian approximation is only valid when retrieval is successful. An improvement of the approximation is obtained by explicitly taking into account the feedback: the network state at time $t+1$, $\sigma(t+1)$, depends on its previous states up to $t-1$, namely $\sigma(0),\ldots,\sigma(t-1)$. One proposal has been to assume two Gaussian peaks, with variance calculated from the residual noise and separated by an appropriately chosen distance [2]. Here too, however, the correlations between the network states $\sigma(t), \sigma(t-1), \sigma(t-2), \ldots$ are only partially taken into account.

More recently, the distribution of the local field has been studied without any a priori assumption on the residual noise, trying to take into account all correlations between the different terms of the sum in (9) using the insight gained before [14], [15] (see also […] and references therein).

In the treatment of the correlations between the variables appearing in this expression for the local field, one has to be very careful, because even a small dependence of order $\mathcal{O}(1/\sqrt{N})$ may give rise to a macroscopic contribution after summation. Therefore, we rewrite the local field as

$$h_i^N(t) = \tilde h_i^\mu(t) + \frac{1}{\sqrt{N}}\,\xi_i^\mu r_N^\mu(t), \tag{11}$$

where we have split off the part of $h_i^N(t)$ that depends strongly on $\xi_i^\mu$, such that $\tilde h_i^\mu(t)$ only depends weakly on $\xi_i^\mu$. Note that the second term is small for large $N$, so

$$\sigma_i(t+1) = g\big(h_i^N(t)\big) = g\Big(\tilde h_i^\mu(t) + \frac{1}{\sqrt{N}}\xi_i^\mu r_N^\mu(t)\Big) = g\big(\tilde h_i^\mu(t)\big) + \frac{1}{\sqrt{N}}\,\xi_i^\mu r_N^\mu(t)\,\frac{\partial g}{\partial h}\Big|_{\tilde h_i^\mu(t)} + \mathcal{O}(N^{-1}). \tag{12}$$



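Using this and the definition of the residual overlap in (8), we get

$$r_N^\mu(t+1) = \tilde r_N^\mu(t) + \chi_N(t)\,r_N^\mu(t), \tag{13}$$

where we have defined

$$\tilde r_N^\mu(t) = \frac{1}{\sqrt{N}}\sum_i \xi_i^\mu\,g\big(\tilde h_i^\mu(t)\big), \qquad \chi_N(t) = \frac{1}{N}\sum_i \frac{\partial g}{\partial h}\Big|_{\tilde h_i^\mu(t)}. \tag{14}$$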



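Now, we take the limit $N \to \infty$. In this limit the density distributions of $h_i^N$ and $\tilde h_i^\mu$ are equal, since they only differ by a term of order $1/\sqrt{N}$. Since $\tilde h_i^\mu(t)$ only depends weakly on $\xi_i^\mu$, we assume that in the limit we can use the CLT for the first term in (13). To the second term in (13) we apply the LLN, arriving at

$$r^\mu(t+1) = \tilde r^\mu(t) + \chi(t)\,r^\mu(t), \tag{15}$$

with $\tilde r^\mu(t)$ a normally distributed random variable $\mathcal{N}(0,1)$ and

$$\chi(t) = \Big\langle\!\Big\langle\Big\langle\frac{\partial g}{\partial h}\Big|_{h(t)}\Big\rangle\Big\rangle\!\Big\rangle = \Big\langle\!\Big\langle\int\prod_{s=0}^{t}dh(s)\;P\big(h(0),h(1),\ldots,h(t)\big)\,\frac{\partial g}{\partial h}\Big|_{h(t)}\Big\rangle\!\Big\rangle, \tag{16}$$

where the average denoted by $\langle\cdot\rangle$ is over the distribution of the local field with probability density $P(h(0),h(1),\ldots,h(t)) = P(\mathbf{h})$, and where $\langle\langle\cdot\rangle\rangle$ denotes the average over the initial conditions and the condensed pattern.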
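Next, in order to find the distribution of the local field in the thermodynamic limit, we start from (9) at time $t+1$ and use (13) and (14):

$$h_i^N(t+1) = \xi_i^1 m_N^1(t+1) + \frac{1}{N}\sum_{\mu>1} g\big(\tilde h_i^\mu(t)\big) + \frac{1}{N}\sum_{\mu>1}\sum_{j\neq i}\xi_i^\mu\xi_j^\mu\,g\big(\tilde h_j^\mu(t)\big) + \chi_N(t)\,\frac{1}{\sqrt{N}}\sum_{\mu>1}\xi_i^\mu r_N^\mu(t) - \alpha\sigma_i(t+1). \tag{17}$$

The first term on the right-hand side is clear, using (10). In the second term, we can replace $\tilde h_i^\mu(t)$ by $h_i^N(t)$, leading to a contribution $\alpha\sigma_i(t+1)$. The third term is a sum of independent random variables, and by the CLT it converges to $\mathcal{N}(0,\alpha)$. In the fourth term, we employ (9). From this we obtain, omitting the site index $i$,

$$h(t+1) = \xi^1 m(t+1) + \chi(t)\big[h(t) - \xi^1 m(t) + \alpha\sigma(t)\big] + \mathcal{N}(0,\alpha). \tag{18}$$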

In this way we arrive at the recursion relation (18) for the local field and (15) for the residual overlap. We still want a recursion relation for the overlap; using the dynamics (5) and starting from (8), we find

$$m(t) = \big\langle\!\big\langle\xi^1\,\langle g(h(t-1))\rangle\big\rangle\!\big\rangle. \tag{19}$$

Finally, from (18) it is clear that the local field consists of a discrete part and a normally distributed part

$$h(t) = M(t) + \mathcal{N}\big(0,\alpha D(t)\big), \tag{20}$$

where $M(t)$ can be found by iterating the recursion relation for the local field, i.e.,

$$M(t) = \xi^1 m(t) + \alpha\sum_{s=0}^{t-1}\Big(\prod_{s'=s}^{t-1}\chi(s')\Big)\sigma(s). \tag{21}$$

The variance of the noise in (20) can be calculated by using (13):

$$\mathrm{Var}\big(r^\mu(t+1)\big) = D(t+1) = 1 + \chi^2(t)\,D(t) + 2\chi(t)\,\mathrm{Cov}\big(\tilde r^\mu(t), r^\mu(t)\big). \tag{22}$$

We still have to write out the probability density of the local field used to define $\chi(t)$ in (16). The evolution equation tells us that $\sigma(t)$ can be replaced by $g(h(t-1))$, such that the second term of $M(t)$ in (21) is a sum of step functions of correlated variables. These variables are also correlated with the Gaussian part of the local field through the dynamics. Therefore, the local field can be seen as a transformation of a set of correlated Gaussian variables $x(s)$, which we choose to normalize. Defining the correlation matrix by $W(s,s')$, $s,s' \neq t-1$, we arrive at the following expression for this probability density



This concludes the SNA treatment of the Little-Hopfield model at zero temperature. The above equations form a recursive scheme to calculate the dynamical properties of the system up to an arbitrary time step. The practical difficulty that remains, certainly after a few time steps, is the explicit calculation of the correlations.

It is possible to extend the method to arbitrary temperatures by introducing auxiliary thermal fields to express the stochastic dynamics within the gain function formulation of the deterministic dynamics [22]

$$\sigma(t+1) = g\big(h(t) + T\gamma(t)\big), \tag{24}$$

where the $\gamma(t)$ are independent and identically distributed with probability density

$$P(\gamma) = \frac{1}{2}\big[1 - \tanh^2(\gamma)\big]. \tag{25}$$

One then averages the zero temperature results over the auxiliary field $\gamma(t)$. This alters the equations in a non-trivial way, but in such a way that the idea of the derivation can be completely retained. At this point we remark that in the derivation of the local field distribution one has to replace $\sigma(t)$ by $g(h(t-1))$, with $h(t-1)$ given by (20). For arbitrary temperatures one has to replace $\sigma(t)$ by $g(h(t-1) + T\gamma(t-1))$, and the average over the $\gamma(t)$ then enters at the same level as the average over the noise. For more details we refer to the literature [22].

2.3 Explicit results for the first four time steps

In order to compare with the results of the GFA method, to be explained in the following section, it is useful to recall the SNA results for the first few time steps as found in […] (and references therein). We only write down explicitly the expressions for the overlap $m(t)$ at zero temperature

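$$m(1) = \Big\langle\!\Big\langle\xi^1\int\mathcal{D}z\;g\big(\xi^1 m(0) + \sqrt{\alpha D(0)}\,z\big)\Big\rangle\!\Big\rangle \tag{26}$$

$$m(2) = \Big\langle\!\Big\langle\xi^1\int\mathcal{D}z\;g\big(\xi^1 m(1) + \alpha\chi(0)\sigma(0) + \sqrt{\alpha D(1)}\,z\big)\Big\rangle\!\Big\rangle \tag{27}$$

$$\begin{aligned} m(3) = \Big\langle\!\Big\langle\xi^1\int\mathcal{D}W(x,y)\;g\Big(&\xi^1 m(2) + \alpha\chi(1)\big[g\big(\xi^1 m(0) + \sqrt{\alpha D(0)}\,x\big) + \chi(0)\sigma(0)\big]\\ &+ \sqrt{\alpha D(2)}\,y\Big)\Big\rangle\!\Big\rangle \end{aligned} \tag{28}$$

$$\begin{aligned} m(4) = \Big\langle\!\Big\langle\xi^1\int\mathcal{D}W(x,y,z)\;g\Big(&\xi^1 m(3) + \alpha\chi(2)\,g\big(\xi^1 m(1) + \alpha\chi(0)\sigma(0) + \sqrt{\alpha D(1)}\,y\big)\\ &+ \alpha\chi(2)\chi(1)\,g\big(\xi^1 m(0) + \sqrt{\alpha D(0)}\,x\big)\\ &+ \alpha\chi(2)\chi(1)\chi(0)\sigma(0) + \sqrt{\alpha D(3)}\,z\Big)\Big\rangle\!\Big\rangle. \end{aligned} \tag{29}$$

Here $\mathcal{D}z$ is the Gaussian measure with variable $z$, while $\mathcal{D}W(x_1,\ldots,x_t)$ is the multidimensional Gaussian measure with correlation matrix $W$ as it appears in (23). For the explicit form of the expressions for the variance of the residual overlap, the function $\chi$ and the correlations, we refer to the literature […].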
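For $g = \mathrm{sgn}$, the first step (26) can be evaluated in closed form. Because $g$ is odd and the Gaussian measure is symmetric, the average over $\xi^1$ drops out, giving $m(1) = \mathrm{erf}\big(m(0)/\sqrt{2\alpha D(0)}\big)$. The sketch below checks this against a direct midpoint-rule integration of $\mathcal{D}z$. It assumes $D(0) = 1$ by default, i.e., an initial state uncorrelated with the non-condensed patterns, and the function names are illustrative.

```python
import math


def m1_closed_form(m0, alpha, d0=1.0):
    """Eq. (26) with g = sgn: m(1) = erf(m0 / sqrt(2 alpha D(0)))."""
    return math.erf(m0 / math.sqrt(2.0 * alpha * d0))


def m1_quadrature(m0, alpha, d0=1.0, n=50_000, zmax=10.0):
    """Eq. (26) by midpoint-rule integration over z; xi^1 = +1 and -1 each with weight 1/2."""
    dz = 2.0 * zmax / n
    total = 0.0
    for k in range(n):
        z = -zmax + (k + 0.5) * dz
        weight = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) * dz  # Gaussian measure Dz
        for xi in (1, -1):
            arg = xi * m0 + math.sqrt(alpha * d0) * z
            total += 0.5 * weight * xi * (1.0 if arg > 0 else -1.0)
    return total
```

For instance, `m1_closed_form(0.6, 0.05)` shows that at low loading a single parallel step already brings the overlap close to 1.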



3 Generating functional approach to the Little-Hopfield model

3.1 The effective local field 

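The idea of the GFA approach to study dynamics […] is to look at the probability to find a certain microscopic path in time. The basic tool to study the statistics of these paths is the generating functional

$$Z[\psi] = \sum_{\sigma(0),\ldots,\sigma(t)} P[\sigma(0),\ldots,\sigma(t)]\prod_{i=1}^{N}\prod_{s=0}^{t} e^{-\mathrm{i}\,\psi_i(s)\sigma_i(s)}, \tag{30}$$

with $P[\sigma(0),\ldots,\sigma(t)]$ the probability to have a certain path in phase space,

$$P[\sigma(0),\ldots,\sigma(t)] = P[\sigma(0)]\prod_{s=0}^{t-1}W[\sigma(s+1)\,|\,\sigma(s)], \tag{31}$$

and $W[\sigma|\sigma']$ the transition probabilities from $\sigma'$ to $\sigma$. Looking back at (3) and assuming parallel dynamics, we have

$$W[\sigma(s+1)\,|\,\sigma(s)] = \prod_{i=1}^{N}\frac{\exp\big(\beta\sigma_i(s+1)h_i(s)\big)}{2\cosh\big(\beta h_i(s)\big)}, \qquad h_i(s) = \sum_{j\neq i}J_{ij}\sigma_j(s) + \theta_i(s), \tag{32}$$

where we have introduced a time-dependent external field $\theta_i(s)$ in order to define a response function.

One can find all the relevant order parameters, i.e., the overlap $m(t)$, the correlation function $C(t,t')$ and the response function $G(t,t')$, by calculating appropriate derivatives of the above functional and letting $\psi = \{\psi_i\}$ tend to zero afterwards:

$$m(t) = \lim_{\psi\to 0}\frac{\mathrm{i}}{N}\sum_i \xi_i^1\,\frac{\partial Z[\psi]}{\partial\psi_i(t)}, \tag{33}$$

$$C(t,t') = -\lim_{\psi\to 0}\frac{1}{N}\sum_i\frac{\partial^2 Z[\psi]}{\partial\psi_i(t)\,\partial\psi_i(t')}, \tag{34}$$

$$G(t,t') = \lim_{\psi\to 0}\frac{\mathrm{i}}{N}\sum_i\frac{\partial^2 Z[\psi]}{\partial\psi_i(t)\,\partial\theta_i(t')}. \tag{35}$$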

In the thermodynamic limit one expects the physics of the problem to be independent of the particular realization of the quenched disorder. Therefore, one is interested in derivatives of $\overline{Z[\psi]}$, with the overline denoting the average over this disorder, i.e., over all pattern realizations. This results in an effective single spin local field given by

$$h(t) = \xi^1 m(t) + \alpha\sum_{s=0}^{t-1}R(t,s)\,\sigma(s) + \sqrt{\alpha}\,\eta(t), \tag{36}$$

with $\eta(t)$ temporally correlated noise with zero mean and correlation matrix

$$D = (I-G)^{-1}C\,(I-G^{T})^{-1}, \tag{37}$$

and the retarded self-interaction

$$R = (I-G)^{-1}G. \tag{38}$$

We refer to […] for further details. The order parameters defined above can be written as

$$m(t) = \big\langle\!\big\langle\xi^1\langle\sigma(t)\rangle_*\big\rangle\!\big\rangle, \tag{39}$$

$$C(s,s') = \big\langle\!\big\langle\langle\sigma(s)\sigma(s')\rangle_*\big\rangle\!\big\rangle, \tag{40}$$

$$G(s,s') = \Big\langle\!\Big\langle\frac{\partial\langle\sigma(s)\rangle_*}{\partial\theta(s')}\Big\rangle\!\Big\rangle. \tag{41}$$

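The average over the effective path measure is given by

$$\langle f\rangle_* = \sum_{\sigma(0),\ldots,\sigma(t)}\int d\eta\;P(\eta)\,P(\sigma|\eta)\,f, \tag{42}$$

where $d\eta = \prod_s d\eta(s)$ and with

$$P(\sigma|\eta) = \prod_{s=0}^{t-1}\frac{\exp\big(\beta\sigma(s+1)h(s)\big)}{2\cosh\big(\beta h(s)\big)}, \tag{43}$$

$$P(\eta) = \frac{1}{\sqrt{\det(2\pi D)}}\exp\Big(-\frac{1}{2}\sum_{s,s'}\eta(s)\,D^{-1}(s,s')\,\eta(s')\Big). \tag{44}$$

The average denoted by the double brackets is again (as in the SNA) over the condensed pattern and initial conditions. We remark that $G(s,s') = 0$ for $s \le s'$ and $D(s,s') = D(s',s)$, and that for all $s < t$

$$G(t,s) = \beta\,\big\langle\!\big\langle\big\langle\sigma(t)\big[\sigma(s+1) - \tanh\big(\beta h(s)\big)\big]\big\rangle_*\big\rangle\!\big\rangle, \tag{45}$$

where $h(s)$ is given by (36).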



3.2 An alternative stochastic description 

The GFA at arbitrary temperatures starts from the transition probabilities (32) following from the Glauber dynamics (3). In order to make the comparison with the SNA later on, we want to introduce an alternative description starting from the deterministic dynamics as in (24)-(25).

Starting from (43) and (44), one can show that the zero temperature dynamics supplemented with the auxiliary thermal field $\gamma(t)$ is equivalent to the original formulation. Indeed, suppose that we want to calculate the effective path average of a general function $f$ using the alternative description. Writing out the expression for this average (the average over the condensed pattern and initial conditions is not relevant here), we have

$$\langle f\rangle_* = \sum_{\sigma(0),\ldots,\sigma(t)}\int d\gamma\int dh\int d\eta\;P(\eta)\prod_{s=0}^{t-1}\Big\{\delta\Big(h(s) - \xi^1 m(s) - \alpha\sum_{s'}R(s,s')\sigma(s') - \sqrt{\alpha}\,\eta(s) - T\gamma(s)\Big)\,\frac{1}{2}\big(1 - \tanh^2(\gamma(s))\big)\,\Theta\big(\sigma(s+1)h(s)\big)\Big\}\,f, \tag{46}$$

where $\gamma$ is the thermal field and $\Theta$ the Heaviside step function, which implements $\sigma(s+1) = \mathrm{sgn}(h(s))$. Since the thermal field only occurs inside the $\delta$-function and the probability density, we can evaluate the integral over $\gamma$. Then the integration over the local field becomes

$$\prod_{s=0}^{t-1}\int dh(s)\;\Theta\big(\sigma(s+1)h(s)\big)\,\frac{\beta}{2}\Big[1 - \tanh^2\Big(\beta\Big(h(s) - \xi^1 m(s) - \alpha\sum_{s'}R(s,s')\sigma(s') - \sqrt{\alpha}\,\eta(s)\Big)\Big)\Big]. \tag{47}$$



This integral can be evaluated and yields

$$\prod_{s=0}^{t-1}\frac{\exp\big(\beta\sigma(s+1)h(s)\big)}{2\cosh\big(\beta h(s)\big)}\,\Bigg|_{h(s) = \xi^1 m(s) + \alpha\sum_{s'}R(s,s')\sigma(s') + \sqrt{\alpha}\,\eta(s)}, \tag{48}$$

which is exactly the same as (43), showing that we can use both representations for the dynamics.
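The equivalence of the Glauber rule (3) with the thermal-field description (24)-(25) rests on the cumulative distribution $F(\gamma) = [1 + \tanh\gamma]/2$ of (25). It gives $\Pr\{\mathrm{sgn}(h + T\gamma) = +1\} = 1 - F(-\beta h) = e^{\beta h}/(2\cosh\beta h)$. The Python sketch below checks this identity and adds a Monte Carlo comparison that samples $\gamma$ by inverting $F$. The function names and sample sizes are assumptions made for the illustration.

```python
import math
import random


def sample_gamma(rng):
    """Draw gamma with density (1/2)[1 - tanh^2(gamma)], eq. (25), by inverting
    its cumulative distribution F(gamma) = (1 + tanh(gamma)) / 2."""
    u = rng.random()
    while u == 0.0:  # atanh(-1) is undefined
        u = rng.random()
    return math.atanh(2.0 * u - 1.0)


def prob_plus_thermal_field(h, T):
    """Exact P(sgn(h + T gamma) = +1) = P(gamma > -h/T) = 1 - F(-h/T)."""
    return 1.0 - 0.5 * (1.0 + math.tanh(-h / T))


def prob_plus_glauber(h, T):
    """Glauber probability (3) for sigma = +1, with beta = 1/T."""
    beta = 1.0 / T
    return math.exp(beta * h) / (2.0 * math.cosh(beta * h))


def prob_plus_monte_carlo(h, T, samples=100_000, seed=0):
    """Fraction of thermal-field samples for which sgn(h + T gamma) = +1."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(samples) if h + T * sample_gamma(rng) > 0)
    return hits / samples
```

The Monte Carlo estimate agrees with the Glauber probability up to sampling error of order $1/\sqrt{\text{samples}}$.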

Using this alternative description for the thermal dynamics, it is straightforward to write the distribution of the local field as

$$P\big(h(t)\big) = \int d\gamma\,d\eta\;P(\eta)\,P(\gamma)\;\delta\Big(h(t) - \xi^1 m(t) - \alpha\sum_{s=0}^{t-1}R(t,s)\,\sigma(s) - \sqrt{\alpha}\,\eta(t) - T\gamma(t)\Big)\Big|_{\sigma,h}, \tag{49}$$

where we denote by $|_{\sigma,h}$ the substitutions $\sigma(s) = g(h(s-1))$ and $h(s) = \xi^1 m(s) + \alpha\sum_{s'}R(s,s')\sigma(s') + \sqrt{\alpha}\,\eta(s) + T\gamma(s)$ for all $s < t$. The above discussion makes it trivial to perform the zero temperature limit in the GFA. It also shows that it is enough to look at the relation between both methods at zero temperature in order to compare them in the whole phase diagram, because the way to extend the methods to finite temperature is completely equivalent.



3.3 Recursive scheme 

The aim of this subsection is to derive recursion relations for the local field 
and also for the noise appearing in the local field starting from the GFA. In 
this way we want to gain more insight into the equations at hand and make 
a detailed comparison between the SNA and GFA. 

From expression (36) for the effective local field, it is not immediately obvious how $h(t)$ depends on $h(s)$, $s < t$. First, we want an expression for the retarded self-interaction $R$ as a function of the previous time step(s). To this end we write the response matrix at time $t+1$ in the following way

$$G_{t+1} = \begin{pmatrix} G_t & 0\\ g_{t+1} & 0\end{pmatrix}, \tag{50}$$

where $g_{t+1}$ is the row vector

$$g_{t+1} = \big(G(t+1,0),\,G(t+1,1),\,\ldots,\,G(t+1,t)\big), \tag{51}$$

and $G_t$ is the response matrix at time $t$. It is clear that adding a time step adds a row and a column to the matrix while leaving the other elements unaltered. From the decomposition of the response matrix, we calculate $R_{t+1}$ (see, e.g., […]) and use this expression in (36) to arrive at

$$h(t) = \xi^1 m(t) + \sum_{s=0}^{t-1}G(t,s)\big[h(s) - \xi^1 m(s) + \alpha\sigma(s)\big] + \sqrt{\alpha}\sum_{s=0}^{t}(I-G)(t,s)\,\eta(s). \tag{52}$$
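One way to fill in the algebra between (36), (38) and (52) is sketched below, using the block form (50). The block form of $(I-G_{t+1})^{-1}$ gives $R_{t+1}$, and multiplying (36) by $I-G$ gives the recursion. This is a reconstruction of the intermediate step and not necessarily the route taken in the cited references. Here $h$, $m$, $\sigma$ and $\eta$ are viewed as vectors over the times $0,\ldots,t$.

```latex
\begin{aligned}
(I-G_{t+1})^{-1} &= \begin{pmatrix} (I-G_t)^{-1} & 0\\ g_{t+1}(I-G_t)^{-1} & 1 \end{pmatrix},
\qquad
R_{t+1} = (I-G_{t+1})^{-1} - I = \begin{pmatrix} R_t & 0\\ g_{t+1}(I-G_t)^{-1} & 0 \end{pmatrix},\\[4pt]
u &\equiv h - \xi^1 m = \alpha\,(I-G)^{-1}G\,\sigma + \sqrt{\alpha}\,\eta
\;\;\Longrightarrow\;\;
(I-G)\,u = \alpha\,G\,\sigma + \sqrt{\alpha}\,(I-G)\,\eta\\
&\;\;\Longrightarrow\;\;
u = G\,(u + \alpha\sigma) + \sqrt{\alpha}\,(I-G)\,\eta .
\end{aligned}
```

The identity $R = (I-G)^{-1} - I$ follows from (38). The $t$-th component of the last line is (52), because $G(t,s) = 0$ for $s \ge t$.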

This is one of the main results of this section and will be used as the basis of comparison between the GFA and SNA approaches. It leads naturally to the definition of a modified noise variable

$$\phi(t) = \sum_{s=0}^{t}(I-G)(t,s)\,\eta(s), \qquad \langle\phi(s)\phi(s')\rangle = C(s,s'), \tag{53}$$

and it follows that the covariance matrix of the noises, $B$, is given by

$$B(s,s') = \langle\eta(s)\phi(s')\rangle = \sum_{s''}\langle\eta(s)\eta(s'')\rangle\,(I-G^{T})(s'',s') = \big[(I-G)^{-1}C\big](s,s'). \tag{54}$$

The correlation matrix of this transformed noise variable is clearly simpler than the original one; it is in fact the correlation matrix of the spins.
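Relations (52)-(54) are exact matrix identities for any causal response matrix, so they can be verified numerically. The plain-Python sketch below draws a hypothetical strictly lower-triangular $G$ and uses an illustrative correlation matrix $C(s,s') = 0.5^{|s-s'|}$ with random $m$, $\sigma$ and $\eta$. None of these inputs come from the model itself. The sketch compares (36) with (52), and checks (53) and (54).

```python
import random


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def inv_unit_lower(l):
    """Inverse of a unit lower-triangular matrix, such as I - G, by forward substitution."""
    n = len(l)
    inv = identity(n)
    for j in range(n):
        for i in range(j + 1, n):
            inv[i][j] = -sum(l[i][k] * inv[k][j] for k in range(j, i))
    return inv


def random_causal_G(n, rng):
    """Hypothetical response matrix with G(s, s') = 0 for s <= s'."""
    return [[rng.uniform(-0.5, 0.5) if s > sp else 0.0 for sp in range(n)] for s in range(n)]


def check_recursion(t=6, alpha=0.1, seed=2):
    """Max difference between h(s) from (36) and from the recursion (52), s = 0..t."""
    rng = random.Random(seed)
    n = t + 1
    I = identity(n)
    G = random_causal_G(n, rng)
    A = [[I[s][sp] - G[s][sp] for sp in range(n)] for s in range(n)]  # I - G
    R = matmul(inv_unit_lower(A), G)  # retarded self-interaction, (38)
    xi = 1
    sigma = [rng.choice((-1, 1)) for _ in range(n)]
    m = [rng.uniform(0.0, 1.0) for _ in range(n)]
    eta = [rng.gauss(0.0, 1.0) for _ in range(n)]
    h36 = [xi * m[s] + alpha * sum(R[s][sp] * sigma[sp] for sp in range(s))
           + alpha ** 0.5 * eta[s] for s in range(n)]
    h52 = []
    for s in range(n):
        feedback = sum(G[s][sp] * (h52[sp] - xi * m[sp] + alpha * sigma[sp]) for sp in range(s))
        noise = sum(A[s][sp] * eta[sp] for sp in range(s + 1))
        h52.append(xi * m[s] + feedback + alpha ** 0.5 * noise)
    return max(abs(x - y) for x, y in zip(h36, h52))


def check_noise_covariances(t=5, seed=3):
    """Check (53) and (54) for D = (I - G)^{-1} C (I - G^T)^{-1}, eq. (37)."""
    rng = random.Random(seed)
    n = t + 1
    I = identity(n)
    G = random_causal_G(n, rng)
    A = [[I[s][sp] - G[s][sp] for sp in range(n)] for s in range(n)]
    A_inv = inv_unit_lower(A)
    C = [[0.5 ** abs(s - sp) for sp in range(n)] for s in range(n)]  # illustrative C
    D = matmul(matmul(A_inv, C), transpose(A_inv))
    B_from_def = matmul(D, transpose(A))          # <eta(s) phi(s')> with phi = (I - G) eta
    B_claim = matmul(A_inv, C)                    # right-hand side of (54)
    C_back = matmul(matmul(A, D), transpose(A))   # <phi(s) phi(s')>, should equal C, (53)
    err_b = max(abs(B_from_def[i][j] - B_claim[i][j]) for i in range(n) for j in range(n))
    err_c = max(abs(C_back[i][j] - C[i][j]) for i in range(n) for j in range(n))
    return err_b, err_c
```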

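Next, we derive recursion relations for the noise in a similar way. Analogous to (50), we write

$$C_{t+1} = \begin{pmatrix} C_t & c_{t+1}^{T}\\ c_{t+1} & 1\end{pmatrix}, \tag{55}$$

where $c_{t+1}$ is the row vector

$$c_{t+1} = \big(C(t+1,0),\,C(t+1,1),\,\ldots,\,C(t+1,t)\big). \tag{56}$$

One then finds

$$D_{t+1} = \begin{pmatrix} D_t & D_t\,g_{t+1}^{T} + (I-G_t)^{-1}c_{t+1}^{T}\\ g_{t+1}D_t + c_{t+1}(I-G_t^{T})^{-1} & 1 + g_{t+1}D_t\,g_{t+1}^{T} + 2\,c_{t+1}(I-G_t^{T})^{-1}g_{t+1}^{T}\end{pmatrix}. \tag{57}$$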

Again, going one time step further implies adding one row and one column to the matrix while the rest of the matrix remains unchanged. From (57) we find a relation for the variance of the noise at time $t+1$

$$D(t+1,t+1) = 1 + \sum_{s,s'=0}^{t}G(t+1,s)\,D(s,s')\,G(t+1,s') + 2\sum_{s=0}^{t}G(t+1,s)\,B(s,t+1) \tag{58}$$

as a function of the variance of the noise at times $s \le t$ and other quantities. At first sight, the right-hand side of this equation still depends on $t+1$ and, therefore, does not seem to be a genuine recursion relation. However, as we will show, the quantities on the right-hand side can be calculated without having full information about time $t+1$; e.g., $G(t+1,s)$ can be obtained without knowing $D(t+1,t+1)$.
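The block structure (57) and the variance relation (58) can likewise be checked against a direct evaluation of (37). In the sketch below, the response matrix and the correlation matrix $C(s,s') = 0.6^{|s-s'|}$ (unit diagonal, so $C(t+1,t+1) = 1$) are hypothetical inputs chosen only for the test. The matrix helpers are redefined so that the block stands alone.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def inv_unit_lower(l):
    """Inverse of a unit lower-triangular matrix (here I - G) by forward substitution."""
    n = len(l)
    inv = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n):
        for i in range(j + 1, n):
            inv[i][j] = -sum(l[i][k] * inv[k][j] for k in range(j, i))
    return inv


def noise_covariance(G, C):
    """D = (I - G)^{-1} C (I - G^T)^{-1}, eq. (37); also returns (I - G)^{-1}."""
    n = len(G)
    A_inv = inv_unit_lower([[(1.0 if i == j else 0.0) - G[i][j] for j in range(n)]
                            for i in range(n)])
    return matmul(matmul(A_inv, C), transpose(A_inv)), A_inv


def check_variance_recursion(t=4, seed=5):
    """Compare the direct D_{t+1} with the block structure (57) and with (58)."""
    rng = random.Random(seed)
    n = t + 2  # times 0, ..., t+1
    G = [[rng.uniform(-0.5, 0.5) if s > sp else 0.0 for sp in range(n)] for s in range(n)]
    C = [[0.6 ** abs(s - sp) for sp in range(n)] for s in range(n)]
    D_full, A_inv = noise_covariance(G, C)
    # (57): the block for times 0..t is unchanged when time t+1 is added
    D_t, _ = noise_covariance([row[:-1] for row in G[:-1]], [row[:-1] for row in C[:-1]])
    err_block = max(abs(D_full[i][j] - D_t[i][j]) for i in range(n - 1) for j in range(n - 1))
    # (58): D(t+1,t+1) from D_t, the new response row g_{t+1} and B = (I - G)^{-1} C, (54)
    B = matmul(A_inv, C)
    g = G[-1]
    rhs = (1.0
           + sum(g[s] * D_t[s][sp] * g[sp] for s in range(n - 1) for sp in range(n - 1))
           + 2.0 * sum(g[s] * B[s][n - 1] for s in range(n - 1)))
    err_var = abs(D_full[-1][-1] - rhs)
    return err_block, err_var
```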






4 SNA versus GFA 



4.1 Short-memory approximation 

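Comparing the expressions (52) and (18) for the effective local field, we notice that the first one contains a sum over all time steps up to $t-1$, while the second one only contains $t-1$ itself. Therefore, we introduce the following approximation

$$\sum_{s=0}^{t-1}G(t,s)\,\hat h(s) \approx X(t-1)\,\hat h(t-1) \equiv \sum_{s=0}^{t-1}\hat G(t,s)\,\hat h(s), \tag{59}$$

where $\hat h(s) = h(s) - \xi^1 m(s) + \alpha\sigma(s)$ denotes the bracket appearing in (52), and $X$ and $\hat G$ are defined by this relation. The approximated matrix $\hat G$ now has a simple form

$$\hat G = \begin{pmatrix} 0 & & & \\ X(0) & 0 & & \\ & X(1) & \ddots & \\ & & X(t-1) & 0\end{pmatrix}. \tag{60}$$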

This approximation reduces the expression (52) for the local field to

$$h(t+1) = \xi^1 m(t+1) + X(t)\big[h(t) - \xi^1 m(t) + \alpha\sigma(t)\big] + \sqrt{\alpha}\,\phi(t+1). \tag{61}$$

Furthermore, the modified noise equation (53) simplifies to

$$\phi(t) = \eta(t) - X(t-1)\,\eta(t-1), \tag{62}$$

and the variance of the noise itself, (58), becomes

$$D(t+1,t+1) = 1 + X^2(t)\,D(t,t) + 2X(t)\,B(t,t+1). \tag{63}$$

Another consequence of this approximation is that the discrete part of the local field (20) can be written down explicitly. In other words, the retarded self-interaction $R$ can be calculated explicitly to be

$$R(t,s) = \prod_{s'=s}^{t-1}X(s'), \qquad s < t. \tag{64}$$
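For the bidiagonal matrix (60), the entries of $(I-\hat G)^{-1}$ below the diagonal are products of the $X$'s along the subdiagonal, and (64) follows from $R = (I-\hat G)^{-1} - I$. A short numerical check in plain Python, using randomly chosen (hypothetical) values $X(0),\ldots,X(t-1)$:

```python
import math
import random


def matmul(a, b):
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inv_unit_lower(l):
    """Inverse of a unit lower-triangular matrix by forward substitution."""
    n = len(l)
    inv = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n):
        for i in range(j + 1, n):
            inv[i][j] = -sum(l[i][k] * inv[k][j] for k in range(j, i))
    return inv


def check_short_memory_R(t=6, seed=7):
    """Max |R(r, c) - prod_{s'=c}^{r-1} X(s')| over all entries, cf. (60) and (64)."""
    rng = random.Random(seed)
    n = t + 1
    X = [rng.uniform(-0.9, 0.9) for _ in range(n - 1)]  # hypothetical X(0), ..., X(t-1)
    G_hat = [[X[r - 1] if r == c + 1 else 0.0 for c in range(n)] for r in range(n)]  # (60)
    A = [[(1.0 if r == c else 0.0) - G_hat[r][c] for c in range(n)] for r in range(n)]
    R = matmul(inv_unit_lower(A), G_hat)  # (38) with G replaced by G_hat
    worst = 0.0
    for r in range(n):
        for c in range(n):
            expected = math.prod(X[c:r]) if c < r else 0.0  # (64); R vanishes on/above diagonal
            worst = max(worst, abs(R[r][c] - expected))
    return worst
```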



Comparing the recursion relations (18) and (61) for the local field, those for the residual overlap, (15) and (62), and those for the variance of the residual overlaps, (22) and (63), we find that they are formally the same by taking

$$X(t) = \chi(t), \qquad \eta(t) \leftrightarrow r^\mu(t), \qquad \phi(t+1) \leftrightarrow \tilde r^\mu(t). \tag{65}$$



Next, we would like to know what the physics behind this approximation is. In fact, all solutions to (59) can be called $k$-th order short-memory approximations, because they approximate the feedback as

$$\sum_{s=0}^{t-1}G(t,s)\,\hat h(s) \approx \Big(\sum_{s=\max(0,\,t-k)}^{t-1}G(t,s)\Big)\,\hat h(t-1). \tag{66}$$

These approximations take into account that responses, in general, decrease very fast as a function of the time difference. As we will show, the first order approximation, obtained for $k = 1$ and simply called the short-memory approximation, corresponds to the SNA:

$$\sum_{s=0}^{t-1}G(t,s)\,\hat h(s) \approx G(t,t-1)\,\hat h(t-1), \qquad \chi(t) = G(t+1,t). \tag{67}$$

We remark that these approximations all have a different $\chi(t)$, so that we can use the latter to distinguish between them. We calculate the $\chi(t)$ corresponding to $k = 1$ starting from the alternative stochastic description of the GFA (forgetting about the average over the initial conditions and condensed pattern). We find

$$G(t+1,t) = \int\frac{\prod_{s}d\eta(s)\,d\gamma(s)\,P(\gamma(s))}{\sqrt{\det(2\pi D)}}\,\exp\Big(-\frac{1}{2}\sum_{s,s'}\eta(s)\,D^{-1}(s,s')\,\eta(s')\Big)\,\frac{\partial g}{\partial h}\Big|_{h(t)+T\gamma(t)}\,\Bigg|_{\sigma,h}.$$
Recalling the distribution (49) of the local field at time $t$ and taking the temperature to be zero (which means vanishing $\gamma$), this expression already resembles the analogous expression (23) in the SNA approach. Furthermore, one sees that $\eta(t-1)$ does not occur in the integrand of this expression. As a consequence, one can show that the matrix $D$ can be replaced by

$$\tilde D(s,s') = \langle\eta(s)\eta(s')\rangle, \qquad s,s' \neq t-1, \tag{68}$$

or in more detail

$$\tilde D = \begin{pmatrix} D(0,0) & D(0,1) & \cdots & D(0,t-2) & D(0,t)\\ D(1,0) & D(1,1) & \cdots & D(1,t-2) & D(1,t)\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ D(t-2,0) & D(t-2,1) & \cdots & D(t-2,t-2) & D(t-2,t)\\ D(t,0) & D(t,1) & \cdots & D(t,t-2) & D(t,t)\end{pmatrix}. \tag{69}$$



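One can then verify that $G(t+1,t)$, for zero temperature, is exactly the same as the expression for $\chi(t)$ in the SNA approach, viz.

$$G(t+1,t) = \int\frac{\prod_{s\neq t-1}d\eta(s)}{\sqrt{\det(2\pi\tilde D)}}\,\exp\Big(-\frac{1}{2}\sum_{s,s'\neq t-1}\eta(s)\,\tilde D^{-1}(s,s')\,\eta(s')\Big)\,\frac{\partial g}{\partial h}\Big|_{h(t)}\,\Bigg|_{\sigma,h}, \tag{70}$$

where, for $g = \mathrm{sgn}$, $\partial g/\partial h\,|_{h(t)} = 2\delta(h(t))$.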



In conclusion, compared with the exact GFA method, we find that the SNA approach is a short-memory approximation in which the responses from times earlier than $t-1$ are not taken into account.
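The family of approximations (66) can be made concrete with a few lines of Python. The sketch below is illustrative only. The response matrix and the values of $\hat h(s)$ are made up, with $G(t,t-2) = 0$ as shown in Section 4.2, and the function names are hypothetical.

```python
def short_memory_X(G, t, k):
    """k-th order coefficient in (66): X(t-1) = sum_{s=max(0,t-k)}^{t-1} G(t,s)."""
    return sum(G[t][s] for s in range(max(0, t - k), t))


def feedback_exact(G, t, h_hat):
    """Exact feedback of (52): sum_{s<t} G(t,s) h_hat(s)."""
    return sum(G[t][s] * h_hat[s] for s in range(t))


def feedback_short_memory(G, t, h_hat, k):
    """Approximation (66): the last k responses are lumped onto h_hat(t-1)."""
    return short_memory_X(G, t, k) * h_hat[t - 1]


# Hypothetical causal response matrix with G(t, t-2) = 0, and made-up values of h_hat(s).
G_EXAMPLE = [
    [0.0, 0.0, 0.0, 0.0],
    [0.4, 0.0, 0.0, 0.0],
    [0.0, 0.3, 0.0, 0.0],
    [0.05, 0.0, 0.2, 0.0],
]
H_HAT_EXAMPLE = [1.0, -0.5, 0.8, 0.3]
```

For $t = 2$ the $k = 1$ result coincides with the exact feedback, because $G(2,0) = 0$. For $t = 3$ the two differ by $G(3,0)\,\hat h(0) = 0.05$, i.e., by a response that reaches back three time steps.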



4.2 Discussion 

The origin of the approximation inherent in the SNA approach, as discussed above, lies in the treatment of the residual noise $\tilde r^\mu(t)$ (recall (14)). We have assumed that, after taking out the term $N^{-1/2}\xi_i^\mu r_N^\mu(t)$ of the local field, $\xi_i^\mu$ and $\tilde h_i^\mu(t)$ are only weakly correlated, and therefore we have applied the CLT to find that $\tilde r^\mu(t)$ converges to a normal distribution. Comparing with the GFA, it appears that we took out only part of the correlations between $\sigma_i(t)$ and $\xi_i^\mu$, namely those coming from the previous time step. In a fully connected system, however, there are feedback loops not only of length 1 and 2 but of arbitrary length, although the longer ones may be less probable [25].

Therefore, one would expect the SNA method to approximate the dynamics already from the second time step onwards. However, we can show that $G(t,t-2)$ is zero, such that the second time step is still exact. Moreover, some of the order parameters in the third time step, e.g., the overlap, only involve noises up to the second time step, so that they are also correct for the third time step.

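In order to show that $G(t,t-2) = 0$ we proceed as follows (recall (45)):

$$G(t,t-2) = \frac{\partial\langle\sigma(t)\rangle_*}{\partial\theta(t-2)} = \beta\,\big\langle\sigma(t)\big[\sigma(t-1) - \tanh\big(\beta h(t-2)\big)\big]\big\rangle_*. \tag{71}$$

Considering the effective path average, the sum over $\sigma(t)$ can be done explicitly:

$$\sum_{\sigma(t)=\pm1}\sigma(t)\,\frac{\exp\big(\beta\sigma(t)h(t-1)\big)}{2\cosh\big(\beta h(t-1)\big)} = \tanh\big(\beta h(t-1)\big). \tag{72}$$

Moreover, the expression for the distribution only contains summations over $\sigma(0)$ up to $\sigma(t-1)$, and there is only one term left that contains $\sigma(t-1)$, so

$$\sum_{\sigma(t-1)=\pm1}\frac{\exp\big(\beta\sigma(t-1)h(t-2)\big)}{2\cosh\big(\beta h(t-2)\big)}\,\big[\sigma(t-1) - \tanh\big(\beta h(t-2)\big)\big] = 0. \tag{73}$$

Note that further sums over spins cannot be done, since $h(t-1)$ contains the spins at times $s \le t-2$. This shows that $G(t,t-2) = 0$.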
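The argument (71)-(73) can also be checked by brute force on a single spin. The sketch below illustrates the argument under stated assumptions; it is not the GFA itself. The noise realization is fixed, and the effective field is taken as $h(s) = a(s) + \theta(s) + \sum_{s'<s} b(s,s')\sigma(s')$. The hypothetical $a(s)$ stands for $\xi^1 m(s) + \sqrt{\alpha}\,\eta(s)$ and $b(s,s')$ for $\alpha R(s,s')$. All spin paths are enumerated exactly, and $G(t,s)$ is obtained by a central finite difference in $\theta(s)$.

```python
import itertools
import math


def mean_last_spin(t, theta, beta, a, b, sigma0=1):
    """Exact <sigma(t)> for one spin driven by the parallel Glauber kernel (43) with
    h(s) = a[s] + theta[s] + sum_{s'<s} b[s][s'] sigma(s'), s = 0, ..., t-1."""
    total = 0.0
    for path in itertools.product((-1, 1), repeat=t):
        sigma = (sigma0,) + path  # sigma(0), sigma(1), ..., sigma(t)
        weight = 1.0
        for s in range(t):
            h = a[s] + theta[s] + sum(b[s][sp] * sigma[sp] for sp in range(s))
            weight *= math.exp(beta * sigma[s + 1] * h) / (2.0 * math.cosh(beta * h))
        total += weight * sigma[t]
    return total


def response(t, s, beta, a, b, eps=1e-6):
    """G(t, s) = d<sigma(t)>/d theta(s) at theta = 0, by a central finite difference."""
    up, down = [0.0] * t, [0.0] * t
    up[s], down[s] = eps, -eps
    return (mean_last_spin(t, up, beta, a, b) - mean_last_spin(t, down, beta, a, b)) / (2.0 * eps)


# Hypothetical inputs: a[s] ~ xi^1 m(s) + sqrt(alpha) eta(s), b[s][s'] ~ alpha R(s, s').
A_EXAMPLE = [0.3, -0.2, 0.1, 0.25]
B_EXAMPLE = [
    [0.0, 0.0, 0.0, 0.0],
    [0.4, 0.0, 0.0, 0.0],
    [0.2, -0.5, 0.0, 0.0],
    [0.1, 0.3, 0.6, 0.0],
]
```

With these numbers and $\beta = 1$, `response(4, 2, 1.0, A_EXAMPLE, B_EXAMPLE)` vanishes to rounding accuracy, whereas $G(4,1)$ and $G(4,3)$ do not.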

4.3 Accuracy in retrieval 

First, for $\alpha = 0$, one can easily verify that the GFA analysis yields

$$G = \begin{pmatrix} 0 & & & \\ \chi(0) & 0 & & \\ & \chi(1) & \ddots & \\ & & \chi(t-1) & 0\end{pmatrix}, \tag{74}$$

implying that the short-memory approximation, i.e., the SNA, is exact. The reason is that the local field at time $t$ does not depend on previous times, because the terms in the retarded self-interaction and the noise are proportional to $\alpha$. The only correlation that remains in the system comes from the dependence of the spins on the local field at the previous time step. Consequently, we expect the SNA to be a very good approximation to the exact dynamics for small loading capacities. Additional reasons are that, for small loading capacities, the convergence time to the attractor is very short, only a few time steps, and that the SNA is exact up to time step 3. Moreover, to write down the evolution equations for order parameters like the overlap, one performs averages over the noise, which tends to wash out small differences. Therefore, not surprisingly, the SNA results for the first few time steps coincide with numerical simulations, as has been reported in the literature before. Only discrepancies of the order $\mathcal{O}(10^{-\ldots})$, which are of the same magnitude as the finite size effects in the simulations and hence not conclusive, have been observed (see, e.g., [12] and references therein).

Analytically, of course, one notices the difference from the fourth time step onwards. Starting from the exact GFA approach and writing the expression in a convenient form for comparison, we get at zero temperature

$$\begin{aligned} m(4) = \Big\langle\!\Big\langle\xi^1\int\mathcal{D}W(x,y,z)\;g\Big(&\xi^1 m(3) + \alpha\chi(2)\,g\big(\xi^1 m(1) + \alpha\chi(0)\sigma(0) + \sqrt{\alpha D(1)}\,y\big)\\ &+ \alpha\chi(2)\chi(1)\,g\big(\xi^1 m(0) + \sqrt{\alpha D(0)}\,x\big)\\ &+ \alpha\chi(2)\chi(1)\chi(0)\sigma(0) + \underline{\alpha G(3,0)\sigma(0)} + \sqrt{\alpha D(3)}\,z\Big)\Big\rangle\!\Big\rangle. \end{aligned} \tag{75}$$

The only difference with the SNA result (29) is the underlined term. It is present because, beyond $t = 3$, $R$ is not simply given by (64). We will show numerically in the next subsection that this difference is small (e.g., of the order of 0.3% for $m_0 = 0.4$ up to 3% for $m_0 = 0.2$, at $T = 0.1$ and $\alpha = 0.1$). It will be interesting to see how this difference behaves for further time steps.

4.4 Numerical results

Now that we know precisely the origin of the approximation inherent in the SNA method as usually applied, we can check numerically how accurate it is for retrieval and also for spin-glass behaviour, although from the point of view of neural networks one is not primarily interested in the latter.
Figure 1: Simulations of the distribution of the residual overlap r(t) at
different time steps, compared with the theoretical distribution calculated
using the SNA (full curve). The left panel is for α = 0.06 and the right panel
for α = 0.16, corresponding to retrieval and spin-glass behaviour,
respectively. Both panels were made for T = 0.1, m_0 = 0.6, using finite-size
simulations with N = 3000 (left) and N = 2000 (right), averaged over 200
samples.



We first compare the limiting normal distribution of the residual overlap r(t)
in the SNA approach with simulations at different time steps. This is done in
Fig. 1 for two representative values of the capacity: α = 0.06, which lies
in the retrieval phase, and α = 0.16, which is above the critical capacity and
thus within the spin-glass phase.

We conclude that in the retrieval region (α < 0.135 for T = 0.1) the
simulation results agree quite well with the limiting SNA distribution,
while in the spin-glass region the results start to deviate systematically at
t = 3. This is consistent with the results obtained for the fully connected
Blume-Emery-Griffiths neural network […]. Note that the distribution of the
residual overlap starts to deviate at t = 3, while m(3) is still exact, as
discussed before.
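A minimal finite-size simulation in the spirit of Fig. 1 can be sketched as follows. It assumes parallel Glauber dynamics with P(σ_i(t+1) = ±1) = [1 ± tanh(β h_i(t))]/2 and Hebbian couplings J_ij = (1/N) Σ_μ ξ_i^μ ξ_j^μ without self-interaction. The function name simulate, the defaults (N = 1000, α = 0.06, T = 0.1, m_0 = 0.6) and the seed are hypothetical choices; N is smaller than in the figure and a single sample is drawn, so this is a rough check, not a reproduction of the published curves. Fields are computed from the overlaps, which costs O(Np) per step instead of storing the N × N coupling matrix.

```python
import math
import random
from statistics import fmean, pvariance


def simulate(n=1000, alpha=0.06, temperature=0.1, m0=0.6, steps=4, seed=1):
    """Parallel (Little) Glauber dynamics with Hebbian couplings
    J_ij = (1/N) sum_mu xi_i^mu xi_j^mu and J_ii = 0.

    Returns the overlap m(t) with the condensed pattern (index 0) and, for
    every t, the residual overlaps r^mu(t) = sqrt(N) m^mu(t) of the other
    patterns, i.e. (1/sqrt N) sum_i xi_i^mu sigma_i(t).
    """
    rng = random.Random(seed)
    p = max(2, round(alpha * n))
    beta = 1.0 / temperature
    xi = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]
    # Initial state: each spin agrees with pattern 0 with probability (1 + m0)/2.
    sigma = [x if rng.random() < (1 + m0) / 2 else -x for x in xi[0]]

    m_hist, r_hist = [], []
    for t in range(steps + 1):
        m = [sum(a * b for a, b in zip(pat, sigma)) / n for pat in xi]
        m_hist.append(m[0])
        r_hist.append([math.sqrt(n) * m_mu for m_mu in m[1:]])
        if t == steps:
            break
        # h_i = sum_mu xi_i^mu m^mu - (p/N) sigma_i  (the last term removes J_ii)
        fields = [sum(xi[mu][i] * m[mu] for mu in range(p)) - p * sigma[i] / n
                  for i in range(n)]
        # Synchronous update: P(sigma_i = +1) = (1 + tanh(beta h_i)) / 2.
        sigma = [1 if rng.random() < 0.5 * (1 + math.tanh(beta * h)) else -1
                 for h in fields]
    return m_hist, r_hist


m, r = simulate()
for t, (m_t, r_t) in enumerate(zip(m, r)):
    print(t, round(m_t, 3), round(fmean(r_t), 3), round(pvariance(r_t), 3))
```

At t = 0 the spins are independent of the non-condensed patterns, so the sample variance of r(0) should be close to 1; histogramming r(t) over many samples would give the kind of comparison shown in Fig. 1.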

We also compare the evolution of the order parameters. As a typical result
we show in Fig. 2 the overlap m(t) as a function of time, using the method in
[…] precisely in order to avoid finite-size effects. The SNA results coincide
with those of the GFA up to the third time step, as shown analytically. For
later time steps the results depend strongly on the parameters of the
network, which determine its behaviour: retrieval or spin-glass. The
parameters chosen in Fig. 2 are α = 0.1, T = 0.1, and m_0 = 0.4 or
m_0 = 0.2. For the first choice (the two upper curves) the system evolves to
the retrieval attractor, and there is only a marginal difference between the
SNA and GFA methods. This confirms earlier observations in the literature
([12] and references therein). For the second choice (the two lower curves)
the initial overlap is too small: we are outside the basin of attraction for
retrieval and hence do not evolve to the attractor (spin-glass behaviour).
Here the SNA, as used in the literature, does not give good results for later
time steps. We have also compared these results with finite-size
simulations; they are not shown in the figure because they lie almost exactly
on the GFA lines.

Figure 2: The overlap order parameter as a function of time for the SNA
versus the GFA approach, for α = 0.1, T = 0.1, with retrieval behaviour for
m_0 = 0.4 and spin-glass behaviour for m_0 = 0.2. The thin line represents the
stationary limit for retrieval.



5 The SNA revisited

In this section we show how to treat the feedback correlations in the SNA
approach correctly. Looking at the definition of the residual overlap at time
t, i.e.,

$$
r^\mu(t) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \xi_i^\mu\, \sigma_i(t),
\tag{76}
$$

we want to know how the dependence between spins and patterns evolves
in the course of the dynamics.

In general, the local field h_i(t) at a given time step depends on ξ_i^μ
in two ways. First, there is the correlation through the occurrence of ξ_i^μ
in the coupling matrix (J_ij). We call these first-order correlations because
they are the most obvious ones. Second, there may be so-called second-order
correlations, i.e., correlations with the pattern ξ^μ that arise through the
dynamics as a result of feedback. Second-order correlations can only appear
due to first-order correlations earlier in the dynamics. Therefore, extracting
the first-order correlations at every time step results in a system that no
longer depends on ξ^μ.
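The first-order correlation through the couplings can be made explicit for Hebbian couplings J_ij = (1/N) Σ_ν ξ_i^ν ξ_j^ν with ±1 patterns: the pattern-μ part of h_i equals ξ_i^μ r^μ/√N, with r^μ the residual overlap, up to a self-coupling term of order 1/N. The sketch below, with a hypothetical helper name and network sizes, subtracts this term and compares the result with the field of a network whose couplings omit pattern μ altogether. The two differ by exactly 1/N per site, which is the excluded self-coupling J_ii.

```python
import math
import random


def hebbian_fields(xi, sigma, skip=None):
    """h_i = sum_{j != i} J_ij sigma_j with J_ij = (1/N) sum_nu xi_i^nu xi_j^nu.
    If `skip` is given, pattern `skip` is left out of the couplings."""
    n = len(sigma)
    h = [0.0] * n
    for nu, pat in enumerate(xi):
        if nu == skip:
            continue
        m_nu = sum(a * b for a, b in zip(pat, sigma)) / n
        for i in range(n):
            # xi_i^nu (m^nu - xi_i^nu sigma_i / N) drops the self-coupling j = i
            h[i] += pat[i] * (m_nu - pat[i] * sigma[i] / n)
    return h


rng = random.Random(0)
n, p, mu = 400, 20, 3            # hypothetical sizes; mu is the pattern removed
xi = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]
sigma = [rng.choice((-1, 1)) for _ in range(n)]

h = hebbian_fields(xi, sigma)
r_mu = sum(a * b for a, b in zip(xi[mu], sigma)) / math.sqrt(n)   # residual overlap
# Remove the first-order (pattern-mu) part of the field.
h_tilde = [h_i - x * r_mu / math.sqrt(n) for h_i, x in zip(h, xi[mu])]

h_ref = hebbian_fields(xi, sigma, skip=mu)
max_dev = max(abs(a - b) for a, b in zip(h_tilde, h_ref))
print(max_dev, 1 / n)
```

The correction removed here is of order 1/√N per site, which is why it can be treated as a small perturbation in the thermodynamic limit.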

We define δh_i^μ(s) by

$$
\delta h_i^\mu(s) = -\frac{1}{\sqrt{N}}\, \xi_i^\mu\, r^\mu(s).
\tag{77}
$$

In this way, adding δh_i^μ(s) removes the first-order correlations from the
local field at time s, i.e.,

$$
\tilde h_i^\mu(s) = h_i(s) + \delta h_i^\mu(s),
\tag{78}
$$

such that h̃_i^μ(s) depends on ξ_i^μ only in second order. Moreover, δh_i^μ(s)
tends to zero as 1/√N in the thermodynamic limit and can therefore be
regarded as a perturbation to the field h_i(s). We apply this perturbation
at all time steps s < t and find



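$$
\sigma_i(t) = \tilde\sigma_i^\mu(t)
- \sum_{s<t} \left.\frac{\partial \sigma_i(t)}{\partial\, \delta h_i^\mu(s)}\right|_{\delta h_i^\mu(s)=0} \delta h_i^\mu(s) + O(1/N),
\tag{79}
$$

where σ̃_i^μ(t) is now completely independent of ξ_i^μ. We rewrite the
derivatives by introducing an external field θ_i(s),

$$
\left.\frac{\partial \sigma_i(t)}{\partial\, \delta h_i^\mu(s)}\right|_{\delta h_i^\mu(s)=0}
= \left.\frac{\partial \sigma_i(t)}{\partial \theta_i(s)}\right|_{\theta_i(s)=0}.
\tag{80}
$$

Inserting (77) in (79) and using (80) yields

$$
\sigma_i(t) = \tilde\sigma_i^\mu(t)
+ \frac{1}{\sqrt{N}} \sum_{s<t} \left.\frac{\partial \sigma_i(t)}{\partial \theta_i(s)}\right|_{\theta_i(s)=0} \xi_i^\mu\, r^\mu(s) + O(1/N).
\tag{81}
$$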

We multiply both sides by ξ_i^μ/√N and sum over i, so that

$$
r^\mu(t) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \xi_i^\mu\, \tilde\sigma_i^\mu(t)
+ \frac{1}{N} \sum_{i=1}^{N} \sum_{s<t} \left.\frac{\partial \sigma_i(t)}{\partial \theta_i(s)}\right|_{\theta_i(s)=0} r^\mu(s) + O(1/\sqrt{N}),
\tag{82}
$$

by using the definition of r^μ(t) and (ξ_i^μ)² = 1. We now take the limit
N → ∞. By construction, the first term on the right-hand side is a sum of
independent terms and thus converges to a normal distribution with zero mean
and variance 1. To the second term we apply the law of large numbers (LLN).
The result of the limit reads

$$
r^\mu(t) = \tilde r^\mu(t) + \sum_{s<t} G(t,s)\, r^\mu(s).
\tag{83}
$$

This recursion relation for r^μ(t) corresponds to the relation in […] by
using the substitutions for η and φ as in […]. Finally, we insert this
relation into the expression (18) of the local field at time t + 1, leading,
after some algebra, to the expression for the exact local field as in
equation (53) for the GFA. This shows that we have found all feedback
correlations.
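Because G is causal (strictly lower triangular in time), (83) can be solved by forward substitution, r = (1 − G)^{-1} r̃. If, as an assumption of this sketch, the Gaussian variables r̃^μ(t) at different times have covariance C(t, s), with C(t, t) = 1 as stated above, then r^μ has covariance (1 − G)^{-1} C (1 − G^T)^{-1}, which is the structure one expects for the GFA noise covariance of the fully connected model. The matrices G and C below are made-up illustrative values, and all helper names are hypothetical; a Monte Carlo estimate should agree with the closed form within sampling error.

```python
import math
import random


def solve_recursion(g, r_tilde):
    """Forward substitution for r(t) = r_tilde(t) + sum_{s<t} G(t, s) r(s)."""
    r = []
    for t, rt in enumerate(r_tilde):
        r.append(rt + sum(g[t][s] * r[s] for s in range(t)))
    return r


def cholesky(c):
    """Lower-triangular L with L L^T = c for symmetric positive-definite c."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = c[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low


def predicted_covariance(g, c):
    """(1 - G)^{-1} C (1 - G^T)^{-1}, applying the recursion twice."""
    n = len(g)
    # Column j of A = (1 - G)^{-1} C solves the recursion with column j of C.
    a_cols = [solve_recursion(g, [c[t][j] for t in range(n)]) for j in range(n)]
    a_rows = [[a_cols[j][t] for j in range(n)] for t in range(n)]
    # D^T = (1 - G)^{-1} A^T, so row t of D solves the recursion with row t of A.
    return [solve_recursion(g, a_rows[t]) for t in range(n)]


# Hypothetical causal response G and correlations C with unit diagonal.
g = [[0.0, 0.0, 0.0, 0.0],
     [0.4, 0.0, 0.0, 0.0],
     [0.0, 0.3, 0.0, 0.0],
     [0.1, 0.0, 0.2, 0.0]]
c = [[1.0, 0.5, 0.3, 0.2],
     [0.5, 1.0, 0.5, 0.3],
     [0.3, 0.5, 1.0, 0.5],
     [0.2, 0.3, 0.5, 1.0]]

d = predicted_covariance(g, c)

# Monte Carlo: draw Gaussian r_tilde with covariance C, then apply (83).
rng = random.Random(7)
low = cholesky(c)
n, samples = len(c), 20000
emp = [[0.0] * n for _ in range(n)]
for _ in range(samples):
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    r_tilde = [sum(low[t][k] * z[k] for k in range(t + 1)) for t in range(n)]
    r = solve_recursion(g, r_tilde)
    for t in range(n):
        for s in range(n):
            emp[t][s] += r[t] * r[s] / samples

for row_d, row_e in zip(d, emp):
    print([round(x, 3) for x in row_d], [round(x, 3) for x in row_e])
```

Solving the recursion directly avoids an explicit matrix inverse; the same forward substitution is reused to build the closed-form covariance.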

6 Other architectures 

In this section we briefly discuss the application of the SNA approach to
other architectures.

For the extremely diluted symmetric architecture, an exact solution for the
dynamics up to time step 3 has been found by using a probabilistic method
analogous to the SNA and comparing it with the generating functional
approach […]. In this case the effective local field, starting from the GFA,
is given by

$$
h(t) = \xi\, m(t) + \alpha \sum_{s=0}^{t-1} R(t,s)\, \sigma(s) + \sqrt{\alpha}\, \eta(t)
\tag{84}
$$

with η(t) a temporally correlated noise with zero mean and correlation
matrix D. Here

$$
D = C, \qquad R = G.
\tag{85}
$$

It is clear that, despite the simplification in both the correlations and the
retarded self-interaction, feedback correlations of arbitrary length survive
and make the dynamics as hard to solve as that of the fully connected model.
Hence, a completely analogous discussion can be made in this case.

For the extremely diluted asymmetric architecture the effective local field
in the GFA approach is given by (84), where now

$$
D = C, \qquad R = 0.
\tag{86}
$$

Hence, the retarded self-interaction is zero, telling us that the local field
at time t does not directly depend on the spins at previous times, but only
indirectly via the noise. This is precisely the reason why, for α = 0, the
short-memory approximation gives the exact answer (see above). It amounts to
having a response function of the form (74). Therefore, the SNA describes
the correct dynamics. The underlying reason is that for asymmetric dilution
the probability of having a loop of finite length tends to zero in the
thermodynamic limit […].

For sequence processing networks the GFA effective local field is given
by (84) with

$$
D = \sum_{n \ge 0} \left(G^\dagger\right)^n C\, G^n, \qquad R = 0,
\tag{87}
$$

and the situation is analogous to that of the asymmetrically diluted model,
in the sense that the retarded self-interaction is zero. Again we have a
response function of the form (74) and, hence, the short-memory
approximation is exact. This is consistent with, and further explains, the
results in […], where it has been shown through explicit calculation that the
order parameter equations obtained through the GFA are equivalent to those of
statistical neurodynamics.

7 Concluding remarks 

In this paper we have revisited the signal-to-noise approach for solving the 
dynamics of the fully connected Little-Hopfield model by comparing it with 
the exact generating functional analysis. In order to do so we have derived 
a recursion relation for the effective local field in the generating functional 
approach. We have shown that the signal-to-noise analysis is a short-memory 
approximation that is exact up to the third time step. For further time steps, 
it stays very accurate in the retrieval region but not in the spin-glass region. 
These results are confirmed by numerical simulations. The application of 
these methods to other architectures and to sequence processing models has 
also been discussed. 